4  Scaling out

Author

Ryan Wesslen

Let’s now work through some examples of scaling out.

5 basic_grid_search.py

This example showcases a simple grid search in one dimension, where we try different parameters for a model and pick the one with the best results on a holdout set.

5.1 Defining the image

First, let’s build a custom image and install scikit-learn in it.

import modal

app = modal.App(
    "example-basic-grid-search",
    image=modal.Image.debian_slim().pip_install("scikit-learn~=1.2.2"),
)

5.2 The Modal function

Next, we define the function. Note that it uses the custom image with scikit-learn installed. It also takes the hyperparameter k, the number of nearest neighbors to use.

@app.function()
def fit_knn(k):
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    clf = KNeighborsClassifier(k)
    clf.fit(X_train, y_train)
    score = float(clf.score(X_test, y_test))
    print("k = %3d, score = %.4f" % (k, score))
    return score, k
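
To complete the grid search, we can fan out over values of k with Modal’s .map, which runs the calls in parallel in the cloud and returns the results locally. Here’s a minimal sketch of a driver (the range of k values is our assumption):

@app.local_entrypoint()
def main():
    # try k = 1..19 in parallel (a hypothetical range) and keep the
    # (score, k) tuple with the best holdout score
    best_score, best_k = max(fit_knn.map(range(1, 20)))
    print("Best k = %3d, score = %.4f" % (best_k, best_score))

Because fit_knn returns (score, k), max compares by score first, so this picks the best-performing k.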

6 fetch_stock_prices.py

TBD

7 Practice problem

Let’s test out what we’ve learned by creating a new script.

For this, we’ll use another scikit-learn tutorial (Gradient Boosting Regularization), but loop through a parameter (the sample size, n), saving a matplotlib image for each run just as we did in fetch_stock_prices.py.

This tutorial is inspired by a recent :probabl. video by Vincent Warmerdam that explored this tutorial in more detail.

7.1 Initialize the App

Let’s first name the app and create the initial image.

import io
import os

import modal

app = modal.App(
    "example-boosting-regularization",
    image=modal.Image.debian_slim()
    .pip_install("scikit-learn~=1.2.2")
    .pip_install("matplotlib~=3.9.0"),
)

We need to install matplotlib as well, since we call it inside our function.

7.2 Define function

For our function, we’ll use:

@app.function()
def fit_boosting(n):
    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn import datasets, ensemble
    from sklearn.metrics import log_loss
    from sklearn.model_selection import train_test_split

    X, y = datasets.make_hastie_10_2(n_samples=n, random_state=1)

    # map labels from {-1, 1} to {0, 1}
    labels, y = np.unique(y, return_inverse=True)

    # note: test_size changed from 0.8 (in the original demo) to 0.2
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    original_params = {
        "n_estimators": 500,
        "max_leaf_nodes": 4,
        "max_depth": None,
        "random_state": 2,
        "min_samples_split": 5,
    }

    plt.figure()

    for label, color, setting in [
        ("No shrinkage", "orange", {"learning_rate": 1.0, "subsample": 1.0}),
        ("learning_rate=0.2", "turquoise", {"learning_rate": 0.2, "subsample": 1.0}),
        ("subsample=0.5", "blue", {"learning_rate": 1.0, "subsample": 0.5}),
        (
            "learning_rate=0.2, subsample=0.5",
            "gray",
            {"learning_rate": 0.2, "subsample": 0.5},
        ),
        (
            "learning_rate=0.2, max_features=2",
            "magenta",
            {"learning_rate": 0.2, "max_features": 2},
        ),
    ]:
        params = dict(original_params)
        params.update(setting)

        clf = ensemble.GradientBoostingClassifier(**params)
        clf.fit(X_train, y_train)

        # compute test set deviance
        test_deviance = np.zeros((params["n_estimators"],), dtype=np.float64)

        for i, y_proba in enumerate(clf.staged_predict_proba(X_test)):
            test_deviance[i] = 2 * log_loss(y_test, y_proba[:, 1])

        plt.plot(
            (np.arange(test_deviance.shape[0]) + 1)[::5],
            test_deviance[::5],
            "-",
            color=color,
            label=label,
        )

    plt.legend(loc="upper right")
    plt.xlabel("Boosting Iterations")
    plt.ylabel("Test Set Deviance")

    # Dump the chart to .png and return the bytes
    with io.BytesIO() as buf:
        plt.savefig(buf, format="png", dpi=300)
        return buf.getvalue()

This is primarily the scikit-learn demo, with a few modifications:

  • we modified the test_size from 0.8 to 0.2
  • we parameterized the sample size n, which we’ll loop through
  • we return the chart as bytes, as in fetch_stock_prices.py
  • we increased the number of boosting iterations from 400 to 500
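
One clarifying note on the metric: the “test set deviance” computed in the loop is the binomial deviance, i.e. twice the average log loss on the test set, where $p_i$ is the predicted probability that $y_i = 1$:

$$
D = -\frac{2}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]
$$

This is why the code multiplies log_loss by 2; lower deviance indicates a better out-of-sample fit.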

Last, we’ll define the local_entrypoint as:

OUTPUT_DIR = "/tmp/modal"


@app.local_entrypoint()
def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    for n in [1000, 5000, 10000, 20000, 50000]:
        plot = fit_boosting.remote(n)
        filename = os.path.join(OUTPUT_DIR, f"boosting_{n}.png")
        print(f"saving data to {filename}")
        with open(filename, "wb") as f:
            f.write(plot)

This ends with each of the images saved into the folder /tmp/modal.
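
As an aside, since this chapter is about scaling out, we could also fan the five calls out in parallel with .map rather than looping over .remote serially. A minimal sketch of an alternative entrypoint:

@app.local_entrypoint()
def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    n_values = [1000, 5000, 10000, 20000, 50000]
    # .map runs fit_boosting on every input in parallel and yields
    # results in input order, so zip pairs each n with its chart bytes
    for n, plot in zip(n_values, fit_boosting.map(n_values)):
        filename = os.path.join(OUTPUT_DIR, f"boosting_{n}.png")
        print(f"saving data to {filename}")
        with open(filename, "wb") as f:
            f.write(plot)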

So let’s now run this:

$ modal run boosting_regularization.py
✓ Initialized. View run at https://modal.com/charlotte-llm/main/apps/ap-xxxxxxxxxx
✓ Created objects.
├── 🔨 Created mount /modal-examples/03_scaling_out/boosting_regularization.py
└── 🔨 Created function fit_boosting.
saving data to /tmp/modal/boosting_1000.png
saving data to /tmp/modal/boosting_5000.png
saving data to /tmp/modal/boosting_10000.png
saving data to /tmp/modal/boosting_20000.png
saving data to /tmp/modal/boosting_50000.png
Stopping app - local entrypoint completed.
✓ App completed. View run at https://modal.com/charlotte-llm/main/apps/ap-xxxxxxxxxx

We can view a few of the images. For example, this is n = 5000:

This one is particularly interesting because of the subsample = 0.5 curve, which generally follows No shrinkage but then jumps up. It’s not clear why; it’s a curious case.

Alternatively, let’s look at n = 10000:

Now we see a result consistent with Vincent’s video: all the curves smooth out, and no shrinkage learns quickly and then levels off. Even after 500 iterations, no shrinkage has a lower deviance, which indicates a better out-of-sample fit.

Let’s last look at n = 50000:

The curves are very similar again, but this time the gains from no shrinkage are magnified even more: through 500 iterations there’s a larger gap between no shrinkage and the shrinkage settings.

What’s nice about Modal is that we can also view the remote logs for each run:

Not surprisingly, our last execution (n = 50000) took the longest, at about 4 minutes and 23 seconds. This is helpful to keep in mind, and we’ll rely on these logs more as we run more computationally intensive examples.